KMID : 0644020200330020051
Journal Of Korean Medical Classics
2020 Volume.33 No. 2 p.51 ~ p.59
A Comparative Study of Feature Extraction Methods for Authorship Attribution in the Text of Traditional East Asian Medicine with a Focus on Function Words
Oh Jun-Ho

Abstract
Objectives : This study aims to identify the most appropriate "feature" for effective authorship attribution of texts of Traditional East Asian Medicine.

Methods : Using 'Variorum of the Nanjing' as the experimental corpus, the authorship attribution performance of a Support Vector Machine (SVM) was compared by cross-validation across three feature choices: function words versus content words, single words versus collocations, and whether or not IDF weighting was applied.
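The feature choices compared above can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that each character of a Classical Chinese text is treated as one word, that bigrams are formed over the sequence that remains after filtering, and that a smoothed IDF is used. The `FUNCTION_WORDS` list and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical function-word list, for illustration only (not the paper's list).
FUNCTION_WORDS = {"之", "也", "者", "而", "其", "以", "則", "於"}


def extract_tokens(text, vocabulary, ngram_sizes=(1, 2)):
    """Keep only characters found in `vocabulary`, then emit n-grams
    over the filtered sequence (assumption: one character = one word)."""
    kept = [ch for ch in text if ch in vocabulary]
    tokens = []
    for n in ngram_sizes:
        for i in range(len(kept) - n + 1):
            tokens.append("".join(kept[i:i + n]))
    return tokens


def tf_vectors(docs, vocabulary, ngram_sizes=(1, 2)):
    """Raw term-frequency (TF) features: one Counter per document."""
    return [Counter(extract_tokens(d, vocabulary, ngram_sizes)) for d in docs]


def apply_idf(tf_list):
    """Re-weight TF features by a smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(tf_list)
    df = Counter()
    for tf in tf_list:
        df.update(tf.keys())  # each document counts once per term
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in tf.items()} for tf in tf_list]


# Usage: 'function words / uni-bigram / TF' versus the same features with IDF.
docs = ["脈者血之府也", "氣者人之根本也"]
tf = tf_vectors(docs, FUNCTION_WORDS, ngram_sizes=(1, 2))
tfidf = apply_idf(tf)
```

The resulting per-document feature dictionaries could then be vectorized and passed to any SVM implementation for cross-validated comparison. Passing `ngram_sizes=(1,)` gives the unigram-only variant, and substituting a content-word vocabulary gives the content-word variant.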

Results : The combination of 'function words/uni-bigram/TF' performed best, with an accuracy of 0.732, while the combination of 'content words/unigram/TFIDF' showed the lowest accuracy, 0.351.

Conclusions : These results indicate the following about authorship attribution of texts of East Asian traditional medicine. First, function words play a more important role than content words. Second, collocations were relatively important for content words, whereas single words were more important for function words. Third, unlike in general text analysis, IDF weighting resulted in worse performance.
KEYWORD
authorship attribution, Function words, Korean Medical Classics, East Asian traditional medicine, Variorum of the Nanjing
Listed journal information
Korea Research Foundation (KCI)